PCOS analysis

Agnes Lorenzen, Cecille Hobbs, Freja E. Klippmann, Julie Dalgaard Petersen & Mille Rask Sander

Introduction

Background

  • Polycystic ovary syndrome (PCOS) is a syndrome documented in women of reproductive age

  • Commonly documented symptoms include period pain, irregular periods, ovary-related problems and hormone imbalance

  • Patients with PCOS often have difficulty conceiving and may experience complications during pregnancy

  • However, the cause of PCOS has still not been established

Aim

The aim of this study is to examine a data set (found on Kaggle) of patients with and without PCOS. The data were collected in India from 10 different hospitals.

Data handling approach

  • Raw data:
    541 observations of 45 variables

  • 01_load_data:
    Simply loads the data

  • 02_clean_data:
    • Replace stray invalid cell values with NA
    • Rename columns and convert categorical columns to factors
    • Split the data frame into body and blood measurements
    • Remove the empty column

  • 03_augment:
    • Convert units (inches to cm)
    • Round BMI and group it into classes
    • Recode blood type and cycle from numeric codes to character labels
    • Create a new column for cycle/pregnancy stage
    • Merge the data frames into one file
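
As an illustration of the cleaning steps listed above, here is a minimal sketch of how stray entries could be turned into NA and the table split in two. The toy data, the column names (`patient_id`, `weight_kg`, `AMH`, `beta_HCG`) and the stray values are assumptions made for this example; they are not taken from the actual scripts.

```r
library(dplyr)

# Toy stand-in for the raw data; all names and values are hypothetical
raw <- tibble(
  patient_id = 1:4,
  weight_kg  = c(44.6, 65.0, 68.8, 52.0),
  AMH        = c("2.07", "1.53", "a", "6.63"),        # "a" mimics a stray entry
  beta_HCG   = c("1.99", "60.8", "1.99.", "494.08")   # "1.99." mimics a typo
)

# Coerce the text columns to numeric; anything that cannot be parsed becomes NA
cleaned <- raw |>
  mutate(across(c(AMH, beta_HCG),
                \(x) suppressWarnings(as.numeric(x))))

# Split into body and blood measurements, keeping the id in both tables
body_measurements_toy  <- cleaned |> select(patient_id, weight_kg)
blood_measurements_toy <- cleaned |> select(patient_id, AMH, beta_HCG)
```

Keeping the identifier in both tables is what would allow them to be merged again in the augment step.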

# Round BMI to one decimal and divide it into categories.
# case_when() uses the first matching condition, so each line only
# needs the upper bound of its class.
body_measurements <- body_measurements |>
  mutate(BMI = round(BMI, 1)) |> 
  mutate(BMI_class = case_when(
    BMI < 18.5 ~ "Underweight",
    BMI < 25 ~ "Normal weight",   # 18.5 <= BMI < 25
    BMI < 30 ~ "Overweight",      # 25 <= BMI < 30
    BMI >= 30 ~ "Obesity")) |>
  mutate(BMI_class = factor(BMI_class,
                            levels =  c("Underweight", 
                                        "Normal weight",
                                        "Overweight", 
                                        "Obesity"))) |>
  relocate(BMI_class, .after = BMI)
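
The other augment steps (unit conversion and recoding of numeric codes) could be sketched in a similar way. The column names and the numeric codes for the cycle (2 and 4) below are assumptions for illustration only and should be checked against the data set's documentation.

```r
library(dplyr)

# Toy data; column names and the cycle codes are hypothetical
body_toy <- tibble(
  hip_inch   = c(36, 38, 40),
  waist_inch = c(30, 32, 36),
  cycle      = c(2, 4, 2)
)

body_toy <- body_toy |>
  # 1 inch = 2.54 cm
  mutate(hip_cm   = round(hip_inch * 2.54, 1),
         waist_cm = round(waist_inch * 2.54, 1)) |>
  # Replace numeric codes with readable labels; unmatched codes become NA
  mutate(cycle = case_when(cycle == 2 ~ "Regular",
                           cycle == 4 ~ "Irregular"))
```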

Descriptive analysis of data 1

Information on the PCOS dataset

Dimensions:

PCOS <- read_tsv(file = "../data/PCOS_merged.tsv")

PCOS_dim <- PCOS |>
  dim() |>
  tibble() |>
  rename("PCOS dimensions" = "dim(PCOS)") |>
  print()

Count of how many have PCOS:

PCOS |> 
  count(PCOS_diagnosis) |>
  as_tibble() |>
  print()

Plot of the ages:

[Figure: Age plot]
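
A plot of the age distribution could be produced along the following lines. This is only a sketch: the toy data, the `Age` column name and the "No"/"Yes" labels are assumptions, and the original figure may have been drawn differently.

```r
library(dplyr)
library(ggplot2)

# Toy data; the Age column and the diagnosis labels are hypothetical
pcos_toy <- tibble(
  Age            = c(24, 28, 31, 35, 27, 33, 29, 40),
  PCOS_diagnosis = factor(c("No", "Yes", "No", "No", "Yes", "Yes", "No", "No"))
)

# Overlaid histograms of age, one colour per diagnosis group
age_plot <- ggplot(pcos_toy, aes(x = Age, fill = PCOS_diagnosis)) +
  geom_histogram(binwidth = 2, alpha = 0.6, position = "identity") +
  labs(x = "Age (years)",
       y = "Number of patients",
       fill = "PCOS diagnosis")

age_plot
```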

Analysis 1

Body measurement data analysis - Follicle number

In this analysis, we examine which factors from the body measurements data the patients diagnosed with PCOS potentially have in common.
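
One way to compare a body measurement between the two groups is a grouped summary followed by a rank-based test. The sketch below uses invented values and hypothetical column names (`follicle_no_left`, `follicle_no_right`); it does not reproduce the study's results or its actual method.

```r
library(dplyr)

# Toy data; values and column names are invented for illustration
follicle_toy <- tibble(
  PCOS_diagnosis    = factor(c("No", "No", "No", "Yes", "Yes", "Yes")),
  follicle_no_left  = c(3, 5, 4, 12, 15, 9),
  follicle_no_right = c(4, 6, 5, 13, 14, 11)
)

# Median follicle number per diagnosis group
follicle_summary <- follicle_toy |>
  group_by(PCOS_diagnosis) |>
  summarise(n            = n(),
            median_left  = median(follicle_no_left),
            median_right = median(follicle_no_right))

# Wilcoxon rank-sum test of the left follicle number between the groups
left_test <- wilcox.test(follicle_no_left ~ PCOS_diagnosis,
                         data = follicle_toy)
```

A rank-based test is used here because follicle counts are discrete and not necessarily normally distributed; a different test may be more suitable for the real data.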

[Figure: two panels, labelled L and R]

PCA of blood measurements

(to be added here)
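
A PCA of a table of numeric measurements could be set up roughly as follows. This is a sketch on simulated data with hypothetical column names (`hb`, `fsh`, `lh`), not the analysis carried out in the project.

```r
library(dplyr)

# Simulated stand-in for the blood measurements; names are hypothetical
set.seed(1)
blood_toy <- tibble(
  hb  = rnorm(30, mean = 11, sd = 1),
  fsh = rnorm(30, mean = 5,  sd = 2),
  lh  = rnorm(30, mean = 3,  sd = 1.5)
)

# PCA on centred and scaled numeric columns, after dropping rows with NA
pca_fit <- blood_toy |>
  select(where(is.numeric)) |>
  na.omit() |>
  prcomp(center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each principal component
variance_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)
```

Scaling matters when the variables are measured in different units; otherwise the variable with the largest variance dominates the first component.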

PCA of body measurements

(to be added here)

Discussion

(to be added here)

Conclusion

  • no significance